Decision Tree Induction: How Effective is the Greedy Heuristic?

Authors

  • Sreerama Murthy
  • Steven Salzberg
Abstract

Most existing decision tree systems use a greedy approach to induce trees: locally optimal splits are induced at every node of the tree. Although the greedy approach is suboptimal, it is believed to produce reasonably good trees. In the current work, we attempt to verify this belief. We quantify the goodness of greedy tree induction empirically, using the popular decision tree algorithms, C4.5 and CART. We induce decision trees on thousands of synthetic data sets and compare them to the corresponding optimal trees, which in turn are found using a novel map coloring idea. We measure the effect on greedy induction of variables such as the underlying concept complexity, training set size, noise and dimensionality. Our experiments show, among other things, that the expected classification cost of a greedily induced tree is consistently very close to that of the optimal tree.
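To make the greedy approach concrete, the following is a minimal sketch of top-down induction in which each node picks the locally best axis-parallel split and never revisits that choice. It is an illustration only, not the authors' experimental setup: the Gini impurity criterion, the midpoint thresholds, the `max_depth` limit, and all function names (`gini`, `best_split`, `build_tree`, `predict`) are assumptions chosen for brevity.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure or empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(X, y):
    """Return (feature, threshold, weighted_impurity) for the locally best
    axis-parallel split, or None if no feature has two distinct values."""
    n = len(y)
    best = None
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y[i] for i in range(n) if X[i][f] <= t]
            right = [y[i] for i in range(n) if X[i][f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (f, t, score)
    return best


def build_tree(X, y, depth=0, max_depth=5):
    """Greedy induction: commit to the best split at this node, then recurse.
    Leaves are class labels; internal nodes are dicts."""
    majority = Counter(y).most_common(1)[0][0]
    if len(set(y)) == 1 or depth == max_depth:
        return majority
    split = best_split(X, y)
    if split is None:
        return majority
    f, t, _ = split
    left_idx = [i for i in range(len(y)) if X[i][f] <= t]
    right_idx = [i for i in range(len(y)) if X[i][f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([X[i] for i in left_idx],
                           [y[i] for i in left_idx], depth + 1, max_depth),
        "right": build_tree([X[i] for i in right_idx],
                            [y[i] for i in right_idx], depth + 1, max_depth),
    }


def predict(tree, x):
    """Follow splits from the root until a leaf label is reached."""
    while isinstance(tree, dict):
        branch = "left" if x[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree


# Toy data (hypothetical): the class depends only on the first feature.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
y = [0, 0, 1, 1]
tree = build_tree(X, y)
```

Because each split is chosen using only the examples that reach the current node, the procedure can miss a tree that is better overall, which is the gap between greedy and optimal trees that the abstract sets out to measure.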

Similar resources

Decision Tree Induction Systems: A Bayesian Analysis

Decision tree induction systems are being used for knowledge acquisition. Yet they have been developed without proper regard for the subjective Bayesian theory of inductive inference. This paper examines the problem tackled by these systems from the Bayesian view in order to interpret the systems and the heuristic methods they use. It is shown that decision tree systems depart from the usual Ba...

A New Decision Tree Induction Using Composite Splitting Criterion

The C4.5 algorithm is the most widely used decision tree algorithm so far, and its most popular heuristic function is gain ratio. This heuristic function has a serious disadvantage when dealing with data sources that contain irrelevant features. Hill climbing is a machine learning technique used in searching. It has a good searching mechanism. Considering the relationship between hill cl...

Finding Optimal Multi-Splits for Numerical Attributes in Decision Tree Learning

Handling continuous attribute ranges remains a deficiency of top-down induction of decision trees. They require special treatment and do not fit the learning scheme as well as one could hope for. Nevertheless, they are common in practical tasks and, therefore, need to be taken into account. This topic has attracted abundant attention in recent years. In particular, Fayyad and Irani showed how opt...

Evolutionary model trees for handling continuous classes in machine learning

Model trees are a particular case of decision trees employed to solve regression problems. They have the advantage of presenting an interpretable output, helping the end-user to get more confidence in the prediction and providing the basis for the end-user to have new insight about the data, confirming or rejecting hypotheses previously formed. Moreover, model trees present an acceptable level ...

Publication date: 1995